Techniques For Modelling Phonological Processes In Automatic Speech Recognition

Author

  • Harriet Jane Nock
Abstract

Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of pronunciation in conversational speech contributes to these differences, the dissertation then focuses on techniques for modelling the phonological processes underlying pronunciation change. Such techniques may be classified as explicit (operating at or close to the level of the word pronunciation dictionary) or implicit (operating at or close to the subword statistical models of the acoustic signal), and both types are considered. An existing explicit technique, motivated by linear phonology and originally evaluated on a dictated speech task, has recently been extended for conversational speech tasks. Rather than modelling pronunciations using phonemic units (which are by definition abstract units with highly variable acoustic realizations), the technique constructs a statistical mapping from the abstract phonemic units to their context-dependent realizations as surface phonetic units (which are by definition less abstract and less variable in acoustic realization). If the map from phonemic units to phonetic realizations is sufficiently accurate, the task of modelling the acoustic realizations of words should be simplified. Small but statistically significant performance improvements can be obtained on the SWITCHBOARD transcription task.
However, further experiments by the author and by other researchers suggest that schemes modelling pronunciation change in terms of speech “segments” have only limited potential. This analysis suggests that a more implicit approach, capable of describing variable degrees of pronunciation change at levels below the segment, may be more appropriate. This motivates investigation into a family of statistical models that could form the basis of such an approach: Loosely-coupled or Factorial Hidden Markov Models (FHMMs). The theory of FHMMs is described and it is then shown that they generalize several standard speech models. Two specific FHMMs are investigated. Analysis of an existing FHMM in the literature, the Mixed-Memory Assumption FHMM, finds that it has potential weaknesses for speech modelling. This leads us to propose a new FHMM, the Parameter-Tied FHMM, which makes fewer a priori assumptions about the data to be modelled. Estimation and decoding of FHMMs are potentially computationally expensive, so approximate algorithms are also developed. Empirical studies using the ISOLET speech classification task (1) show that FHMMs scale to speech modelling tasks, (2) show that the Parameter-Tied FHMM achieves performance comparable to the Mixed-Memory Assumption FHMM for speech modelling, and (3) identify an approximate algorithm for decoding and estimation that is adequate for more extensive experimentation. A short study using the TI DIGITS task shows that FHMMs can be scaled to continuous speech recognition whilst continuing to achieve classification performance competitive with more conventional models. The thesis ends with a summary and possible directions for future research.
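
As an illustration of the general FHMM idea only (not the Mixed-Memory Assumption or Parameter-Tied models of the thesis, whose details are not given in this abstract), the sketch below computes the exact likelihood of an observation sequence under a factorial HMM by running the forward algorithm over the product of the chains' state spaces. It assumes chains that evolve independently and are coupled only through a shared emission; the function name `fhmm_likelihood`, the callable `toy_emit`, and all numeric values are hypothetical.

```python
from itertools import product


def fhmm_likelihood(init, trans, emit, observations):
    """Exact forward algorithm for a factorial HMM via its product state space.

    init[k][i]      : P(chain k starts in state i)
    trans[k][i][j]  : P(chain k moves from state i to state j)
    emit(states, o) : P(observation o | tuple of chain states), a hypothetical callable
    """
    sizes = [len(p) for p in init]
    joint_states = list(product(*(range(n) for n in sizes)))

    def joint_init(s):
        # Chains are assumed independent a priori, so the joint prior factorizes.
        p = 1.0
        for k, i in enumerate(s):
            p *= init[k][i]
        return p

    def joint_trans(s, t):
        # Each chain transitions independently of the others (an assumption).
        p = 1.0
        for k, (i, j) in enumerate(zip(s, t)):
            p *= trans[k][i][j]
        return p

    # alpha[s] = P(observations so far, current joint state = s)
    alpha = {s: joint_init(s) * emit(s, observations[0]) for s in joint_states}
    for o in observations[1:]:
        alpha = {
            t: emit(t, o) * sum(alpha[s] * joint_trans(s, t) for s in joint_states)
            for t in joint_states
        }
    return sum(alpha.values())


# Toy model: two binary chains and a binary observation (all values made up).
init = [[0.6, 0.4], [0.5, 0.5]]
trans = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]


def toy_emit(states, o):
    # Probability of observing 1 depends on both chains' states.
    p_one = 0.1 + 0.2 * states[0] + 0.5 * states[1]
    return p_one if o == 1 else 1.0 - p_one


likelihood = fhmm_likelihood(init, trans, toy_emit, [0, 1, 1])
```

The number of joint states is the product of the per-chain state counts, so this exact computation grows exponentially with the number of chains, which is consistent with the abstract's remark that estimation and decoding can be expensive. With a single chain the function reduces to the ordinary HMM forward algorithm.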


Similar resources

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques face many challenges for speech recognition, and one of the challenges b...


Verification of phonological rules recognitio

I introduce a way of verifying phonological processes on the basis of phonetic substances obtained by automatic speech recognition. The acoustic characteristics of phone-like units are modelled using automatic speech recognition techniques. This phone recogniser is run on the data tokens whose segmental structure matches with the context of target processes to be verified. Examining output stri...


Improving Automatic Phonetic Transcription of Spontaneous Speech Through Variant-Based Pronunciation Variation Modelling

In this paper we present an experiment aimed at improving automatic phonetic transcription of Dutch spontaneous speech through a variant-based method of pronunciation variation modelling. For spontaneous speech, the literature does not always provide enough rules to describe its characteristic phonological processes. Therefore, other methods should be applied to model pronunciation variation fo...


A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...


Evidence of phonological processes in automatic recognition of children's speech

Automatic speech recognition (ASR) for children’s speech is more difficult than for adults’ speech. A plausible explanation is that ASR errors are due to predictable phonological effects associated with language acquisition. We describe phone recognition experiments on hand labelled data for children aged between 5 and 9. A comparison of the resulting confusion matrices with those for adult spe...


Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...




Publication date: 2001